Entropy Manipulation of Arbitrary Nonlinear Mappings - Neural Networks for Signal Processing VII: Proceedings of the 1997 IEEE Workshop (1997)
Abstract
We discuss an unsupervised learning method which is driven by an information-theoretic criterion. Information-theoretic learning has been examined by several authors: Linsker [2, 3], Bell and Sejnowski [5], Deco and Obradovic [1], and Viola et al. [6]. The method we discuss differs from previous work in that it is extensible to a feed-forward multi-layer perceptron with an arbitrary number of layers and makes no assumption about the underlying PDF of the input space. We show a simple unsupervised method by which multi-dimensional signals can be nonlinearly transformed onto a maximum entropy feature space, resulting in statistically independent features.

1.0 INTRODUCTION

Our goal is to develop mappings that yield statistically independent features. We present here a nonlinear adaptive method of feature extraction. It is based on concepts from information theory, namely mutual information and maximum cross-entropy. The adaptation is unsupervised in the sense that the mapping is determined without assigning an explicit target output, a priori, to each exemplar. It is driven, instead, by a global property of the output: cross-entropy.

There are many mappings by which statistically independent outputs can be obtained. At issue is the usefulness of the derived features. Toward this goal we apply Linsker's Principle of Information Maximization, which seeks to transfer maximum information about the input signal to the output features. It is also shown that the resulting adaptation rule fits naturally into the back-propagation method for training multi-layer perceptrons.

Previous methods [1] have optimized entropy at the output of the mapping by considering the underlying distribution at the input, which is a complex problem for general nonlinear mappings. The method presented here, by contrast, is more directly related to the technique of Bell and Sejnowski [5], in that we manipulate entropy through observation at the output of the mapping. Specifically, we exploit a property of entropy coupled with a saturating nonlinearity, which results in a method for entropy manipulation that is extensible to feed-forward multi-layer perceptrons (MLPs). The technique can be used for an MLP with an arbitrary number of hidden layers. As mutual information is a function of two entropy terms, the method can be applied to the manipulation of mutual information as well.

In section 2 we discuss the concepts upon which our feature extraction method is based. We derive the adaptation method, which results in statistically independent features, in section 3. An example result is presented in section 4, while our conclusions and observations appear in section 5.

2.0 BACKGROUND

The method we describe here combines cross-entropy maximization with Parzen window probability density function estimation. These concepts are reviewed below.

2.1 Maximum Entropy as a Self-organizing Principle

Maximum entropy techniques have been applied to a host of problems (e.g. blind separation, parameter estimation, coding theory). Linsker [2] proposed maximum entropy as a self-organizing principle for neural systems, the basic premise being that any mapping of a signal through a neural network should be accomplished so as to maximize the amount of information preserved. Linsker demonstrates this principle of maximum information preservation for several problems, including a deterministic signal corrupted by Gaussian noise.
Mathematically, Linsker's principle is stated as

I(X, Y) = h_Y(Y) - h_{Y|X}(Y|X),   (1)

where I(X, Y) is the mutual information of the random variables X and Y, and h(·) is the continuous entropy measure [4]. Given the random vector Y ∈ R^N, the continuous entropy is defined as

h_Y(Y) = -∫ f_Y(y) log f_Y(y) dy,   (2)

where f_Y(y) is the probability density function of Y.
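The excerpt stops short of the adaptation rule itself, so the following is only a minimal numerical sketch of the two ingredients named in section 2.0, Parzen window density estimation and the continuous entropy of eq. (2), together with the maximum-entropy property of a saturating nonlinearity that the method exploits: when the nonlinearity matches the input CDF, the output is uniform on (0, 1) and its entropy is maximal. The function names, kernel width, and grid resolution below are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from scipy.stats import norm

    def parzen_pdf(samples, points, sigma):
        # Parzen window estimate: a Gaussian kernel centred on each
        # observed sample, averaged over all samples.
        diffs = points[:, None] - samples[None, :]
        k = np.exp(-0.5 * (diffs / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        return k.mean(axis=1)

    def entropy_estimate(samples, sigma=0.05, grid=1024):
        # Plug-in estimate of h(Y) = -integral of f(y) log f(y) dy, eq. (2),
        # using the Parzen density and a simple Riemann sum.
        y = np.linspace(samples.min() - 4 * sigma, samples.max() + 4 * sigma, grid)
        f = np.clip(parzen_pdf(samples, y, sigma), 1e-12, None)  # avoid log(0)
        return -np.sum(f * np.log(f)) * (y[1] - y[0])

    rng = np.random.default_rng(0)
    x = rng.normal(size=5000)  # Gaussian input samples

    # A saturating nonlinearity matched to the input CDF yields a uniform
    # (maximum-entropy) output on (0, 1); a mismatched slope does not.
    y_matched = norm.cdf(x)                      # matched: output ~ uniform
    y_mismatch = 1.0 / (1.0 + np.exp(-0.3 * x))  # logistic with arbitrary slope

    print("matched  h(Y) ~", entropy_estimate(y_matched))   # near 0 nats
    print("mismatch h(Y) ~", entropy_estimate(y_mismatch))  # well below 0

With the matched CDF the entropy estimate lands near 0 nats, the maximum attainable for a density supported on (0, 1); the mismatched slope concentrates the output around 0.5 and drives the estimate well below zero. A gradient that pushes the output entropy toward this maximum is the kind of signal an entropy-manipulation rule of the sort described here can exploit.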
Similar resources
Complete memory structures for approximating nonlinear discrete-time mappings
This paper introduces a general structure that is capable of approximating input-output maps of nonlinear discrete-time systems. The structure is comprised of two stages, a dynamical stage followed by a memoryless nonlinear stage. A theorem is presented which gives a simple necessary and sufficient condition for a large set of structures of this form to be capable of modeling a wide class of no...
A delay damage model selection algorithm for NARX neural networks
Recurrent neural networks have become popular models for system identification and time series prediction. NARX (Nonlinear AutoRegressive models with eXogenous inputs) neural network models are a popular subclass of recurrent networks and have been used in many applications. Though embedded memory can be found in all recurrent network models, it is particularly prominent in NARX models. We show ...
Cross Entropy-Based High-Impedance Fault Detection Algorithm for Distribution Networks
The low fault current of high-impedance faults (HIFs) is one of the main challenges for the protection of distribution networks. The inability of conventional overcurrent relays to detect these faults results in sustained electric arcing, which creates fire and electric-shock hazards and poses a serious threat to human life and network equipment. This paper presents an HIF detection algori...